KMID : 1132720120100010044
|
|
Genomics & Informatics 2012, Volume 10, No. 1, pp. 44-50
|
|
Efficient Mining of Interesting Patterns in Large Biological Sequences
|
|
Rashid Mamunur, Karim Rezaul, Jeong Byeong-Soo, Choi Ho-Jin
|
|
Abstract
|
|
|
Pattern discovery in biological sequences (e.g., DNA sequences) is one of the most challenging tasks in computational biology and bioinformatics. In most existing approaches, the number of occurrences is the main measure for determining whether a pattern is interesting. In computational biology, however, a pattern that is not frequent may still be highly informative if its actual support exceeds the prior expectation by a large margin. In this paper, we propose a new interestingness measure that can provide meaningful biological information. We also propose an efficient index-based method for mining such interesting patterns. Experimental results show that our approach can find interesting patterns within an acceptable computation time.
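The idea that a pattern can be informative when its observed support exceeds its expected support can be illustrated with a minimal sketch. This is not the measure or the index structure defined in the paper, which the abstract does not specify. It assumes a simple null model in which bases occur independently at their overall frequencies, uses a plain position index over fixed-length substrings, and scores each pattern by the log ratio of observed to expected count. The function names `build_index`, `expected_count`, and `surprise_scores` are hypothetical.

```python
from collections import defaultdict
from math import log2


def build_index(seq, k):
    """Map every length-k substring to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def expected_count(pattern, base_freq, n_windows):
    """Expected occurrences under an ASSUMED null model of independent bases."""
    p = 1.0
    for base in pattern:
        p *= base_freq[base]
    return p * n_windows


def surprise_scores(seq, k):
    """Return {pattern: (observed, expected, log2(observed / expected))}."""
    index = build_index(seq, k)
    base_freq = {b: seq.count(b) / len(seq) for b in set(seq)}
    n_windows = len(seq) - k + 1
    scores = {}
    for pattern, positions in index.items():
        observed = len(positions)  # support is read directly from the index
        expected = expected_count(pattern, base_freq, n_windows)
        scores[pattern] = (observed, expected, log2(observed / expected))
    return scores


if __name__ == "__main__":
    # Illustrative input only; not data from the paper.
    for pattern, (obs, exp, score) in sorted(surprise_scores("AAAACCCC", 2).items()):
        print(pattern, obs, round(exp, 2), round(score, 2))
```

A positive score marks a pattern that occurs more often than the null model predicts, even when its raw count is small. Because the index stores start positions, the support of any pattern is obtained without rescanning the sequence.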
|
|
Keywords
|
|
DNA sequence, index-based method, information gain, pattern mining